Star Hotels Project

Context

A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons for cancellation include a change of plans, scheduling conflicts, etc. Cancelling is often made easier by the option to do so free of charge or at a low cost. This benefits hotel guests, but for hotels it is an undesirable and possibly revenue-diminishing factor to deal with. Such losses are particularly high for last-minute cancellations.

New technologies, especially online booking channels, have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of handling cancellations, which are no longer driven only by traditional booking and guest characteristics.

The cancellation of bookings impacts a hotel on various fronts:

Objective

The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. Star Hotels Group has a chain of hotels in Portugal; it is facing problems with a high number of booking cancellations and has reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the provided data to find which factors strongly influence booking cancellations, build a predictive model that can predict in advance which bookings will be canceled, and help formulate profitable policies for cancellations and refunds.

Data Description

The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.

Data Dictionary

Importing necessary libraries and data

Read the dataset

Check for any missing data

Check the info of the data

Check for duplicates

Let's drop the duplicate rows
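The missing-value and duplicate checks above map onto a few pandas calls. The sketch below runs them on a small hypothetical frame; the column names follow the data dictionary, but the values and the `Canceled`/`Not_Canceled` labels are made up for illustration, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows (not the real Star Hotels data): row 1 has a missing
# price and row 3 is an exact copy of row 2.
data = pd.DataFrame({
    "no_of_adults": [2, 1, 2, 2],
    "lead_time": [224, 5, 1, 1],
    "avg_price_per_room": [65.0, np.nan, 60.0, 60.0],
    "booking_status": ["Not_Canceled", "Not_Canceled", "Canceled", "Canceled"],
})

# Number of missing values per column
missing = data.isnull().sum()

# Number of rows that are identical, in every column, to an earlier row
n_duplicates = int(data.duplicated().sum())

# Keep the first occurrence of each row and renumber the index
data = data.drop_duplicates().reset_index(drop=True)

print(missing.to_dict())
print("duplicates removed:", n_duplicates, "| rows left:", len(data))
```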

Let's check the different unique values for the object columns

Change the data type of these columns to category
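A minimal sketch of listing unique values and converting columns to the pandas `category` dtype, on hypothetical toy data. Listing the columns explicitly, rather than selecting them by dtype, is a choice made here so the same loop also covers 0/1 flag columns such as `repeated_guest`.

```python
import pandas as pd

# Hypothetical toy frame; column names follow the data dictionary
data = pd.DataFrame({
    "type_of_meal_plan": ["Meal Plan 1", "Not Selected", "Meal Plan 1"],
    "market_segment_type": ["Online", "Offline", "Online"],
    "repeated_guest": [0, 1, 0],
    "lead_time": [224, 5, 1],
})

cat_cols = ["type_of_meal_plan", "market_segment_type", "repeated_guest"]

# Unique values of each column before conversion
for col in cat_cols:
    print(col, sorted(data[col].unique()))

# Convert the listed columns to the memory-efficient 'category' dtype
for col in cat_cols:
    data[col] = data[col].astype("category")

print(data.dtypes)
```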

EDA

Functions for plotting

Univariate Analysis

arrival_month
arrival_year
arrival_date
type_of_meal_plan
room_type_reserved
market_segment_type
booking_status
Convert the required_car_parking_space and repeated_guest columns to category type
required_car_parking_space
repeated_guest
'no_of_adults' & 'no_of_children'
'no_of_weekend_nights' & 'no_of_week_nights'
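The plotting helpers are not reproduced here. The percentages a labeled bar plot typically annotates can be computed directly with `value_counts(normalize=True)`; `category_percentages` is a hypothetical helper name and the data is a toy example.

```python
import pandas as pd

# Hypothetical toy column standing in for any categorical column above
data = pd.DataFrame({
    "market_segment_type": ["Online", "Online", "Offline", "Corporate", "Online"],
})

def category_percentages(df: pd.DataFrame, col: str) -> pd.Series:
    """Share of rows in each category, in percent, largest first."""
    return (df[col].value_counts(normalize=True) * 100).round(1)

shares = category_percentages(data, "market_segment_type")
print(shares)
```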

EDA for Numeric columns

lead_time
avg_price_per_room
no_of_special_requests

Bivariate Analysis

'no_of_adults' Vs 'booking_status'
'no_of_children' Vs 'booking_status'
'no_of_weekend_nights' Vs 'booking_status'
'no_of_week_nights' Vs 'booking_status'
'type_of_meal_plan' Vs 'booking_status'
'market_segment_type' Vs 'booking_status'
'repeated_guest' Vs 'booking_status'
'no_of_special_requests' Vs 'booking_status'
'no_of_previous_cancellations' Vs 'booking_status'
'lead_time' Vs 'booking_status'
'avg_price_per_room' Vs 'booking_status'
'avg_price_per_room' Vs 'room_type_reserved' Vs 'booking_status'
'avg_price_per_room' Vs 'type_of_meal_plan' Vs 'booking_status'
'avg_price_per_room' Vs 'market_segment_type'
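For a categorical feature against `booking_status`, a row-normalized crosstab gives the cancellation rate within each category, which is what a stacked bar plot of the two usually shows. A sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows
data = pd.DataFrame({
    "market_segment_type": ["Online", "Online", "Online", "Offline", "Offline", "Corporate"],
    "booking_status": ["Canceled", "Not_Canceled", "Canceled",
                       "Not_Canceled", "Canceled", "Not_Canceled"],
})

# normalize="index" divides each row by its total, so every segment's
# Canceled and Not_Canceled shares add up to 100%
rates = pd.crosstab(
    data["market_segment_type"], data["booking_status"], normalize="index"
) * 100
print(rates.round(1))
```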

Data Overview

Questions:

  1. What are the busiest months in the hotel?
  2. Which market segment do most of the guests come from?
  3. Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room prices in different market segments?
  4. What percentage of bookings are canceled?
  5. Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of repeating guests cancel?
  6. Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation?

Data Preprocessing

Outlier detection

lead_time - Outlier treatment
avg_price_per_room - Outlier treatment
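The exact outlier treatment used for these two columns is not shown. One common approach, assumed here, is to cap values at the IQR whiskers, Q1 - 1.5*IQR and Q3 + 1.5*IQR, instead of dropping rows. `cap_outliers_iqr` is a hypothetical helper and the lead times are toy values.

```python
import pandas as pd

def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those whiskers."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Hypothetical lead times with one extreme value
lead_time = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
capped = cap_outliers_iqr(lead_time)

# Q1 = 3.25, Q3 = 7.75, IQR = 4.5, so the upper whisker is 14.5
print(capped.max())
```

The same function would be applied to `avg_price_per_room`; capping keeps every booking in the data while limiting the pull of extreme values.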

Binning - avg_price_per_room

Create bins for the room price
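A sketch of binning the price with `pd.cut`. The bin edges and labels are hypothetical, chosen for illustration; the notebook's actual bins are not given. Using -inf and inf as the outer edges means every price, including 0, lands in a bin.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, including 0
prices = pd.Series([0.0, 80.0, 100.0, 120.0, 300.0])

# Hypothetical edges and labels; intervals are right-closed, e.g. (75, 100]
price_bins = pd.cut(
    prices,
    bins=[-np.inf, 75, 100, 150, np.inf],
    labels=["Low", "Medium", "High", "Premium"],
)
print(price_bins.tolist())
```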

Preparing data for modeling

Feature engineering

Add Day of the week

Add total_no_of_person

Add total_no_of_days

Convert day_of_week to category type
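The three engineered features can be derived as follows, on hypothetical toy rows. `pd.to_datetime` can assemble dates from a DataFrame whose columns are named `year`, `month`, and `day`, so the arrival columns are renamed first; `errors="coerce"` turns an impossible date, such as February 29 in a non-leap year, into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical toy rows; the second one has an invalid date (2018-02-29)
data = pd.DataFrame({
    "arrival_year": [2018, 2018, 2017],
    "arrival_month": [10, 2, 12],
    "arrival_date": [2, 29, 25],
    "no_of_adults": [2, 1, 2],
    "no_of_children": [0, 0, 2],
    "no_of_weekend_nights": [1, 0, 2],
    "no_of_week_nights": [2, 3, 5],
})

# Build a datetime from the three arrival columns
dates = pd.to_datetime(
    data[["arrival_year", "arrival_month", "arrival_date"]].rename(
        columns={"arrival_year": "year", "arrival_month": "month", "arrival_date": "day"}
    ),
    errors="coerce",
)

# Day of the week as a categorical feature (NaN where the date was invalid)
data["day_of_week"] = dates.dt.day_name().astype("category")

# Party size and length of stay
data["total_no_of_person"] = data["no_of_adults"] + data["no_of_children"]
data["total_no_of_days"] = data["no_of_weekend_nights"] + data["no_of_week_nights"]

print(data[["day_of_week", "total_no_of_person", "total_no_of_days"]])
```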

Questions:

  1. What are the busiest months in the hotel? Ans: March, which has the highest number of non-canceled bookings; August, May, and April are also busy.
  2. Which market segment do most of the guests come from? Ans: Online.
  3. Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room prices in different market segments? Ans: Aviation: mostly between 90 and 110.
  4. What percentage of bookings are canceled? Ans: About 34% of bookings are canceled.
  5. Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of repeating guests cancel? Ans: Less than 1% of repeated guests cancel their booking.
  6. Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation? Ans: Yes. The more special requests a guest makes, the less likely the booking is to be canceled.

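Answers such as these can be computed directly rather than read off plots. A sketch on hypothetical toy rows; note that question 5 asks for a conditional share (cancellations among repeated guests' bookings), which is different from the share of all cancellations that come from repeated guests.

```python
import pandas as pd

# Hypothetical toy rows
data = pd.DataFrame({
    "arrival_month": [3, 3, 8, 5, 3, 4],
    "repeated_guest": [0, 1, 0, 1, 0, 0],
    "booking_status": ["Not_Canceled", "Not_Canceled", "Canceled",
                       "Not_Canceled", "Canceled", "Not_Canceled"],
})
canceled = data["booking_status"].eq("Canceled")

# Q1: month with the most non-canceled bookings
busiest_month = data.loc[~canceled, "arrival_month"].value_counts().idxmax()

# Q4: share of all bookings that were canceled
pct_canceled = canceled.mean() * 100

# Q5: share of repeated guests' bookings that were canceled
pct_repeat_canceled = canceled[data["repeated_guest"] == 1].mean() * 100

print(busiest_month, round(pct_canceled, 1), pct_repeat_canceled)
```
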
Let's take the relevant columns for the model

Model evaluation criterion

The model can make two kinds of wrong predictions:

  1. Predicting that a booking will not be canceled when in reality it is canceled (false negative): loss of resources.
  2. Predicting that a booking will be canceled when in reality it is not (false positive): wrong flagging.

Which loss is greater?

How can this loss be reduced, i.e., how can false negatives be reduced?

Positive and negative events
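Because the goal is to reduce false negatives, `Canceled` is treated as the positive event (1) and `Not_Canceled` as the negative event (0), and recall, TP / (TP + FN), is the metric to maximize. A small sketch with hypothetical labels and predictions:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix, recall_score

# Encode the target: Canceled is the positive event (1), Not_Canceled is 0
status = pd.Series(["Canceled", "Not_Canceled", "Canceled", "Canceled", "Not_Canceled"])
y_true = (status == "Canceled").astype(int)

# Hypothetical model predictions
y_pred = [1, 0, 0, 1, 1]

# With labels [0, 1], ravel() returns tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# Recall = TP / (TP + FN): the share of actual cancellations the model catches
recall = recall_score(y_true, y_pred)
print(f"TP={tp} FN={fn} FP={fp} TN={tn} recall={recall:.3f}")
```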

Data Preparation

Checking Multicollinearity
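Multicollinearity is usually checked with the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing column j on all the other columns. statsmodels provides `variance_inflation_factor` for this; the NumPy-only sketch below computes the formula explicitly so each step is visible. `vif_table` is a hypothetical helper, and the synthetic columns are built so that two of them are nearly collinear. A VIF above 5 (some use 10) is a common rule of thumb for "high".

```python
import numpy as np
import pandas as pd

def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from an OLS regression of
    column j on all other columns plus an intercept."""
    vifs = {}
    for col in X.columns:
        y = X[col].to_numpy(dtype=float)
        others = X.drop(columns=col).to_numpy(dtype=float)
        A = np.column_stack([np.ones(len(X)), others])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        vifs[col] = 1.0 / (1.0 - r2)
    return pd.Series(vifs)

# Synthetic check: x2 is almost a copy of x1, x3 is independent noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=200),
    "x3": rng.normal(size=200),
})
vif = vif_table(X)
print(vif.round(2))
```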

Building a Logistic Regression model

Logistic Regression (with Sklearn library)

Checking performance on training set

Checking performance on test set
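A sketch of fitting the sklearn logistic regression and comparing recall on the training and test sets. `make_classification` generates synthetic data with roughly the 34% positive rate reported for cancellations; the 70/30 stratified split is an assumption, not necessarily the split used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded booking data: about 34% positives
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.66, 0.34], random_state=1
)

# Stratify so train and test keep the same share of cancellations
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

lg = LogisticRegression(max_iter=1000)
lg.fit(X_train, y_train)

# Recall on both splits; a large gap between them would suggest overfitting
recall_train = recall_score(y_train, lg.predict(X_train))
recall_test = recall_score(y_test, lg.predict(X_test))
print(f"recall train={recall_train:.3f} test={recall_test:.3f}")
```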

Observations

Logistic Regression (with statsmodels library)

logit sm model

Remove the high-VIF columns
None of the columns has a high VIF, so no column needs to be removed.

Observations

Now no feature has a p-value greater than 0.05, so we'll consider the features in X_train3 as the final ones and lg3 as the final model.

Converting coefficients to odds
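In a logistic regression, each coefficient is a change in log-odds, so exponentiating it gives the factor by which the odds of cancellation are multiplied for a one-unit increase in that feature, holding the others fixed. The coefficients below are made-up values for two hypothetical features, not the fitted `lg3` coefficients.

```python
import numpy as np
import pandas as pd

# Hypothetical log-odds coefficients (not the fitted lg3 values)
coefs = pd.Series({"feature_a": 0.016, "feature_b": -1.2})

# A one-unit increase multiplies the odds of cancellation by exp(coef)
odds = np.exp(coefs)

# Equivalent percentage change in the odds
pct_change_odds = (odds - 1) * 100

print(pd.DataFrame({"odds": odds, "change_in_odds_%": pct_change_odds}).round(4))
```

For example, exp(-1.2) is about 0.301, so a one-unit increase in `feature_b` would cut the odds of cancellation by roughly 70%.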

Coefficient interpretations

Checking model performance on the training set

ROC-AUC

Model Performance Improvement
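One common way to improve recall is to move the classification threshold away from 0.5. The sketch below, on synthetic data, picks the threshold that maximizes TPR - FPR on the ROC curve (Youden's J statistic); which threshold rule this notebook uses is not stated, and a precision-recall curve is another frequent choice, so treat this as one possible approach.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score, roc_curve

# Synthetic stand-in for the training data
X_train, y_train = make_classification(
    n_samples=1000, n_features=8, weights=[0.66, 0.34], random_state=1
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predicted probability of the positive class (Canceled = 1)
proba = model.predict_proba(X_train)[:, 1]
auc = roc_auc_score(y_train, proba)

# Youden's J: the ROC point with the largest TPR - FPR
fpr, tpr, thresholds = roc_curve(y_train, proba)
optimal_threshold = thresholds[np.argmax(tpr - fpr)]

# Compare the default 0.5 cutoff with the tuned one
recall_default = recall_score(y_train, (proba >= 0.5).astype(int))
recall_tuned = recall_score(y_train, (proba >= optimal_threshold).astype(int))
print(f"AUC={auc:.3f} threshold={optimal_threshold:.3f} "
      f"recall@0.5={recall_default:.3f} recall@tuned={recall_tuned:.3f}")
```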

Checking model performance on training set


Model performance evaluation


Final Model Summary

Building a Decision Tree model

First, let's create functions to calculate the different metrics and the confusion matrix
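The two helpers below are a minimal sketch of such functions; the names `model_performance_classification` and `confusion_matrix_percent` are hypothetical, and the usage lines fit a small tree on synthetic data only to show the output shape.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.tree import DecisionTreeClassifier

def model_performance_classification(model, predictors, target) -> pd.DataFrame:
    """One-row table of accuracy, recall, precision and F1."""
    pred = model.predict(predictors)
    return pd.DataFrame({
        "Accuracy": [accuracy_score(target, pred)],
        "Recall": [recall_score(target, pred)],
        "Precision": [precision_score(target, pred, zero_division=0)],
        "F1": [f1_score(target, pred, zero_division=0)],
    })

def confusion_matrix_percent(model, predictors, target) -> pd.DataFrame:
    """Confusion matrix with each cell as a percentage of all rows."""
    cm = confusion_matrix(target, model.predict(predictors), labels=[0, 1])
    return pd.DataFrame(
        cm / cm.sum() * 100,
        index=["Actual 0", "Actual 1"],
        columns=["Predicted 0", "Predicted 1"],
    )

# Usage on synthetic data
X, y = make_classification(n_samples=500, n_features=6, random_state=1)
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)
print(model_performance_classification(tree, X, y))
print(confusion_matrix_percent(tree, X, y).round(2))
```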

Checking model performance on test set

Visualizing the Decision Tree

Do we need to prune the tree?

Yes. The recall score differs greatly between the training and test sets, so we use grid search for hyperparameter tuning of the tree model.
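A sketch of the grid search with scikit-learn's `GridSearchCV` on synthetic data. The class weights are the ones in the final tree reported in this notebook; the parameter grid is a hypothetical example. Scoring by recall reflects the aim of reducing false negatives.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the booking data
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.66, 0.34], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Class weights as in the final tree reported in this notebook
estimator = DecisionTreeClassifier(class_weight={0: 0.34, 1: 0.66}, random_state=1)

# Hypothetical search space for the pre-pruning hyperparameters
param_grid = {
    "max_depth": [3, 5, 7, None],
    "max_leaf_nodes": [5, 10, 20, None],
    "min_impurity_decrease": [0.0001, 0.001, 0.01],
}

# Each candidate is scored by cross-validated recall
grid = GridSearchCV(estimator, param_grid, scoring="recall", cv=5)
grid.fit(X_train, y_train)

best_tree = grid.best_estimator_  # refit on the full training set by default
print(grid.best_params_)
print(f"train recall={recall_score(y_train, best_tree.predict(X_train)):.3f} "
      f"test recall={recall_score(y_test, best_tree.predict(X_test)):.3f}")
```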

Reducing overfitting

Checking performance on training set

Checking performance on test set

Pre-pruning 2

Checking performance on training set

Checking performance on test set

Cost Complexity Pruning

For the remainder, we remove the last element in clfs and ccp_alphas, because it is the trivial tree with only one node.
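The cost complexity pruning steps can be sketched with scikit-learn's `cost_complexity_pruning_path`, which returns the effective alphas at which subtrees of the fully grown tree are pruned away. Synthetic data stands in for the bookings; `clfs` and `ccp_alphas` follow the names used in the text.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the booking data
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.66, 0.34], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Effective alphas (and total leaf impurities) along the pruning path
clf = DecisionTreeClassifier(random_state=1)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities

# One tree per alpha; larger alphas give smaller trees
clfs = [
    DecisionTreeClassifier(random_state=1, ccp_alpha=alpha).fit(X_train, y_train)
    for alpha in ccp_alphas
]

# The last alpha prunes everything down to the root node, so drop it
trivial_tree = clfs[-1]
clfs, ccp_alphas = clfs[:-1], ccp_alphas[:-1]

# Recall on each split for every remaining alpha, to compare pruning levels
recall_train = [recall_score(y_train, c.predict(X_train)) for c in clfs]
recall_test = [recall_score(y_test, c.predict(X_test)) for c in clfs]
print("nodes in trivial tree:", trivial_tree.tree_.node_count)
```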

Creating a model with ccp_alpha = 0.000000001

Model Performance Comparison and Conclusions

DecisionTreeClassifier(class_weight={0: 0.34, 1: 0.66}, max_depth=3, max_leaf_nodes=5, min_impurity_decrease=0.001, random_state=1)

Conclusions

Actionable Insights and Recommendations

What profitable policies for cancellations and refunds can the hotel adopt?
What other recommendations would you suggest to the hotel?